Hidden Markov Models for Earthquake Prediction: GaussianHMM Analysis
Overall summary
Main Idea:
- split the data into training (80%) and testing (20%) chronologically
- train a Gaussian HMM on the training data
- use the best-performing HMM parameters (based on log-likelihood, L) to forecast values
- the best model on Xtrain is selected via time-series cross-validation with n_splits=3
- otherwise selection always chose the highest number of states and fit to it
- get the L, AIC, and BIC of the best-performing model on the testing data
L: 9755.188755348854, AIC: 8635.969558918778, BIC: 12679.881862766959
- use the HMM to fit states on the testing data based on feature thresholds and several criteria
- the criterion that fits the states to the test features best is chosen by highest accuracy
- compare the fitted values against the forecasted values to get the comparison metrics
For reference, the training metrics were:
L: −5590.1346817662425, AIC: 11552.269363532485, BIC: 12679.881862766959
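The chronological cross-validation used to pick the number of states can be sketched without any HMM library. The splitter below mirrors the behaviour of scikit-learn's TimeSeriesSplit with n_splits=3 (expanding training window; every validation block lies strictly after its training data), which is what prevents the selection from simply rewarding the largest state count:

```python
import numpy as np

def time_series_splits(n_samples, n_splits=3):
    # Expanding-window chronological splits: the validation sets are the last
    # n_splits equal-sized blocks; each fold trains on everything before its block.
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        start = n_samples - (n_splits - k + 1) * fold
        yield np.arange(0, start), np.arange(start, start + fold)
```

With hmmlearn (the usual GaussianHMM implementation, assumed here), each candidate state count N would then be scored as `GaussianHMM(n_components=N).fit(X[tr]).score(X[va])` per fold, keeping the N with the best mean held-out log-likelihood.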
Univariate models were also tried on the counts alone, with Poisson and Gaussian emissions:
Poisson:
L: −2536.0145200794263, AIC: 5434.029040158853, BIC: 6403.977432758818
Gaussian:
L: 5456.240959411002, AIC: −10230.481918822004, BIC: −8403.120582487261
HOWEVER, there are a couple of issues with using univariate data based only on counts above a threshold:
- Loss of risk-critical information: A univariate count model ignores magnitude, depth, location, and energy release patterns that are essential for assessing actual financial/infrastructure risk
- a single magnitude 8.0 earthquake poses a vastly different hazard than three magnitude 6.0 events, even though a pure count model would rate the three smaller events as the riskier scenario
- No actionable insights for risk mitigation: risk-modelling or portfolio-optimisation models need to understand what types of seismic conditions drive risk (shallow vs. deep events, proximity to assets, energy build-up patterns) to make informed hedging or allocation decisions; a simple count prediction provides no guidance on which geographic regions, asset types, or time horizons require different risk premiums
The HMM pipeline is given in much more detail below:
1. Gaussian Hidden Markov Models (GaussianHMM)
1.1 Mathematical Foundation
A Hidden Markov Model (HMM) is a statistical model that assumes the system being modeled is a Markov process with unobserved (hidden) states. For earthquake prediction, we use GaussianHMM where the observable features follow a multivariate Gaussian distribution.
Model Components
A GaussianHMM is characterized by:
- Number of hidden states: N
- Initial state probabilities: π={πi} where πi=P(q1=Si)
- State transition probabilities: A={aij} where aij=P(qt+1=Sj∣qt=Si)
- Emission probabilities: Gaussian distributions for each state
For a sequence of observations O={o1,o2,...,oT} and hidden states Q={q1,q2,...,qT}:
State Transition Probability:
P(qt+1=j∣qt=i)=aij
Emission Probability (Gaussian):
For state i, the observation ot follows a multivariate Gaussian distribution:
bi(ot)=P(ot∣qt=i)=N(ot;μi,Σi)
Where:
N(ot; μi, Σi) = 1 / ( (2π)^(d/2) |Σi|^(1/2) ) · exp( −(1/2) (ot − μi)^T Σi^(−1) (ot − μi) )
- μi: mean vector for state i
- Σi: covariance matrix for state i
- d: dimensionality of observations
Joint Probability
The joint probability of observations and hidden states:
P(O, Q ∣ λ) = π_{q1} · ∏_{t=1}^{T−1} a_{qt, qt+1} · ∏_{t=1}^{T} b_{qt}(ot)
Where λ={π,A,B} represents the model parameters.
Forward-Backward Algorithm
Forward variable:
αt(i)=P(o1,o2,...,ot,qt=i∣λ)
Recursive computation:
αt+1(j) = [ ∑_{i=1}^{N} αt(i) aij ] · bj(ot+1)
Backward variable:
βt(i)=P(ot+1,ot+2,...,oT∣qt=i,λ)
Likelihood computation:
P(O ∣ λ) = ∑_{i=1}^{N} αT(i)
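The forward recursion can be sketched in log space with plain NumPy. This is a minimal sketch assuming diagonal covariances (each Σi reduces to a vector of per-feature variances); in practice hmmlearn's `model.score(X)` returns the same log-likelihood:

```python
import numpy as np

def gauss_logpdf(x, mean, var):
    # log N(x; mean, diag(var)) for a single observation vector x
    x, mean, var = np.asarray(x), np.asarray(mean), np.asarray(var)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def forward_loglik(obs, pi, A, means, variances):
    # log P(O | lambda) via the forward recursion, kept in log space for stability
    T, N = len(obs), len(pi)
    logA = np.log(A)
    logb = np.array([[gauss_logpdf(obs[t], means[i], variances[i])
                      for i in range(N)] for t in range(T)])
    log_alpha = np.log(pi) + logb[0]          # alpha_1(i)
    for t in range(1, T):                     # alpha_{t+1}(j) recursion
        log_alpha = np.array([
            np.logaddexp.reduce(log_alpha + logA[:, j]) + logb[t, j]
            for j in range(N)])
    return np.logaddexp.reduce(log_alpha)     # sum over alpha_T(i)
```

On a toy 2-state model this matches brute-force summation of P(O, Q ∣ λ) over all state sequences.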
1.2 Feature Engineering for Earthquake Data
Our GaussianHMM uses the following earthquake features:
- Inter-event Time: Time between consecutive earthquakes (days)
- Depth: Earthquake depth (km)
- Magnitude: Earthquake magnitude (Richter scale)
- Time Since Magnitude 6.5+: Days since last major earthquake
- Energy: Seismic energy release: E = 10^(1.5M + 4.8) Joules
- Distance from Reference: Haversine distance from reference point (24.785°N, 121.006°E)
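Two of these features are closed-form and can be sketched directly (reference coordinates are the ones listed above; R = 6371 km is the mean Earth radius):

```python
import numpy as np

REF_LAT, REF_LON = 24.785, 121.006  # reference point from the feature list

def haversine_km(lat1, lon1, lat2=REF_LAT, lon2=REF_LON):
    # Great-circle distance on a sphere of mean Earth radius 6371 km
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def seismic_energy_joules(magnitude):
    # Gutenberg-Richter energy relation: E = 10^(1.5M + 4.8) J
    return 10.0 ** (1.5 * magnitude + 4.8)
```

This also quantifies the count-model objection above: a single M 8.0 event releases roughly 333 times the energy of three M 6.0 events combined.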
Energy Budget Computation
Energy budget analysis using rolling window approach:
Energy Budget Ratio = ( ∑_{i=t−w}^{t} Ei ) / ( Ē × w )
Where:
- Ei: Energy of earthquake i
- Ē: Average daily energy release
- w: Window size (30 days)
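The rolling ratio above can be sketched as follows; this is a minimal sketch assuming a trailing window of w daily-energy values ending at day t (the exact window alignment is not pinned down by the formula):

```python
import numpy as np

def energy_budget_ratio(daily_energy, window=30):
    # Trailing-window energy sum divided by (long-run daily average x window);
    # values > 1 indicate above-average release over the last `window` days.
    e = np.asarray(daily_energy, dtype=float)
    ebar = e.mean()
    out = np.full(e.shape, np.nan)  # NaN until a full window is available
    for t in range(window - 1, len(e)):
        out[t] = e[t - window + 1 : t + 1].sum() / (ebar * window)
    return out
```

A flat energy series yields a ratio of exactly 1.0 once the window fills, which is a quick sanity check on the normalisation.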
2. Validation and Testing Methodology
2.1 Model Evaluation Metrics
Log-Likelihood:
L(λ) = log P(O ∣ λ) = log ∑_Q P(O, Q ∣ λ)
Akaike Information Criterion (AIC):
AIC=2k−2L(λ)
Bayesian Information Criterion (BIC):
BIC=klog(n)−2L(λ)
Where:
- k: Number of model parameters
- n: Number of observations
- For a GaussianHMM with diagonal covariances: k = N(N−1) + (N−1) + Nd + Nd = N² + 2Nd − 1 (transitions, initial probabilities, means, and diagonal covariance entries)
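The parameter count and both criteria are small enough to write out directly (a minimal sketch; `loglik` would come from the fitted model's held-out log-likelihood):

```python
import math

def gaussianhmm_n_params(n_states, n_features):
    # N(N-1) transitions + (N-1) initial probs + N*d means + N*d diagonal variances
    N, d = n_states, n_features
    return N * (N - 1) + (N - 1) + 2 * N * d

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik
```

For example, with N = 4 states on the d = 6 features above, k = 4·3 + 3 + 24 + 24 = 63 = N² + 2Nd − 1.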
2.1.2 Mean Squared Error (MSE)
For forecasting evaluation:
MSE = (1/T) ∑_{t=1}^{T} ‖xt − x̂t‖²
Where xt is actual observation and x^t is forecasted value.
2.2 State Validation Methods
2.2.1 Threshold-Based State Categorization
States are categorized using statistical bounds:
Lower Threshold_i = μi − 2σi
Upper Threshold_i = μi + 2σi
Categorization Criterion:
An observation is assigned to state j if at least η×100% of its features fall within the thresholds:
Assigned State = argmax_j (1/D) ∑_{d=1}^{D} I[ μj,d − 2σj,d ≤ xd ≤ μj,d + 2σj,d ], provided this maximal fraction is ≥ η
Where I[⋅] is the indicator function and η∈[0.5,1.0] is the criteria threshold.
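The categorization rule can be sketched as follows; `means` and `stds` hold the per-state feature means μj,d and standard deviations σj,d, and the default η = 0.8 is an arbitrary illustrative choice inside [0.5, 1.0]:

```python
import numpy as np

def assign_state(x, means, stds, eta=0.8):
    # Fraction of features of observation x falling inside mu +/- 2*sigma for
    # each state; return the best state if that fraction reaches eta, else -1
    # (meaning no state meets the criterion).
    within = (x >= means - 2 * stds) & (x <= means + 2 * stds)  # shape (N, d)
    frac = within.mean(axis=1)
    j = int(np.argmax(frac))
    return j if frac[j] >= eta else -1
```

The -1 sentinel makes the "no state qualifies" case explicit rather than silently forcing a closest-state assignment.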
2.2.2 Classification Metrics
Accuracy: Percentage of correctly classified states
Precision, Recall, F1-score: Standard classification metrics
Confusion Matrix: Cross-tabulation of predicted vs actual states
2.3 Forecasting Validation
2.3.1 State-Based Forecasting
For each hidden state i, the forecasted observation is:
x̂t = μi, where i = argmax_j P(qt = j ∣ O1:t−1)
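The one-step forecast can be sketched directly: propagate the filtered state distribution P(q_{t−1} ∣ O1:t−1) through the transition matrix, then emit the mean of the most probable next state (a minimal sketch; `state_probs` would come from the forward pass on the observed prefix):

```python
import numpy as np

def forecast_next_observation(state_probs, A, means):
    # One-step prediction: P(q_t = j | O_{1:t-1}) = sum_i P(q_{t-1} = i) * A[i, j]
    next_probs = np.asarray(state_probs) @ np.asarray(A)
    i = int(np.argmax(next_probs))          # most probable next state
    return np.asarray(means)[i], i          # forecast = that state's mean vector
```

Returning the state index alongside the forecast keeps the state time-series analysis in 2.3.2 free.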
2.3.2 Temporal Analysis
- Hidden State Time Series: Analysis of state transitions over time
- State Persistence: Duration analysis of consecutive state occurrences
- Transition Patterns: Identification of common state transition sequences
2.4 Severe Event Prediction
Severe Event Definition:
Severe Event = 1 if (M ≥ 6 and D ≤ 100) or (M ≥ 5.5 and D ≤ 10), 0 otherwise
Where M is magnitude and D is depth in kilometers.
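The labelling rule translates directly into a predicate:

```python
def is_severe(magnitude, depth_km):
    # Severe if M >= 6 at depth <= 100 km, or M >= 5.5 and very shallow (<= 10 km)
    return int((magnitude >= 6.0 and depth_km <= 100.0)
               or (magnitude >= 5.5 and depth_km <= 10.0))
```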
2.5 Cross-Validation Strategy
- Temporal Split: Training on historical data, testing on recent events
- Walk-Forward Validation: Incremental training with rolling prediction windows
- State Consistency Check: Verification that similar conditions yield similar states
3. Implementation Details
3.1 Model Training Process
- Data Preprocessing: Feature scaling and normalization
- Model Selection: Grid search over number of states (2-10)
- Parameter Optimization: Expectation-Maximization algorithm
- Convergence Criteria: Log-likelihood improvement threshold
- Final Selection: Best model chosen by BIC/AIC scores
3.2 Validation Pipeline
- Load trained model: Best performing GaussianHMM from training phase
- Feature computation: Extract same features from test data
- State fitting: Assign states on the test data from the extracted features and per-feature thresholds
- State comparison: Compare the fitted states against the predicted states (from the Viterbi algorithm)
- Confusion matrix: Compute and plot the confusion matrix of fitted vs. predicted states
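The last two steps can be sketched without a plotting library (sklearn.metrics.confusion_matrix would compute the same table); `fitted` comes from the threshold rule and `predicted` from Viterbi decoding:

```python
import numpy as np

def confusion_matrix(fitted, predicted, n_states):
    # Rows: threshold-fitted states; columns: Viterbi-predicted states
    cm = np.zeros((n_states, n_states), dtype=int)
    for f, p in zip(fitted, predicted):
        cm[f, p] += 1
    return cm

def accuracy(cm):
    # Fraction of observations on the diagonal (fitted == predicted)
    return np.trace(cm) / cm.sum()
```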